Skip to content

PoC faster LSMs with static calls/keys: ask for feedbacks#3

Closed
PaulRenauld wants to merge 15 commits intoqueue-2020from
poc-updated
Closed

PoC faster LSMs with static calls/keys: ask for feedbacks#3
PaulRenauld wants to merge 15 commits intoqueue-2020from
poc-updated

Conversation

@PaulRenauld
Copy link
Copy Markdown
Owner

@PaulRenauld PaulRenauld commented Jul 21, 2020

Copied from #2 (rebased after the static call branch on the queue was updated to contain the most recent changes on master):

This patch is to ask for comments and suggestions.

This uses static keys and static calls to get rid of the indirect calls for the LSMs.
For each LSM hook, there are 3 "slots" (the number can be changed). A slot corresponds to a static key, which indicates if this slot is used or not, and a static call, which would contain the function to call if the slot is used.
For example, for the hook file_permission, we will define:

DEFINE_STATIC_CALL(static_call_file_permission_1, LSM_FUNC_DEFAULT(file_permission));
DEFINE_STATIC_KEY_FALSE(static_check_file_permission_1);
DEFINE_STATIC_CALL(static_call_file_permission_2, LSM_FUNC_DEFAULT(file_permission));
DEFINE_STATIC_KEY_FALSE(static_check_file_permission_2);
DEFINE_STATIC_CALL(static_call_file_permission_3, LSM_FUNC_DEFAULT(file_permission));
DEFINE_STATIC_KEY_FALSE(static_check_file_permission_3);

Although the static key API provides an array definition, it is not the case for static calls, and it would be non-trivial to add it (code here). It makes it hard to provide a cleaner structure and we cannot use a for-loop to access the slots. Instead, we use a macro FOR_EACH_HOOK_SLOT(M, ...) which will expand.

We also need to keep in mind that not all functions use call_int_hook/call_void_hook. Some access the list directly. Therefore we still need to add the hooks to the linked list. Right now, we cannot have more LSMs on a specific hook than 3. If we want to remove the limit, we could add all the LSM to the hook list, but keep a pointer to the 3rd element. Then, in call_int_hook/call_void_hook, keep the iteration on the list, but start at index 3. This way, the first three LSMs are optimized, and the other have the same cost than before.


Tasks specific to the LSMs:

  • Remove the macro magic when adding a new LSM hook.
  • Adapt the number of LSM slots to the number of compiled LSMs.
  • Remove static keys.

Tasks for the static call API:

  • Add a conitionnal call for non-void function
    • Make this call able to evaluate its arguments
    • Extend this call to other configurations (without static call inline and without static calls)

create keys and add to static list

minimum working poc

wip

cleanup
for-loop like

struct slot

get rid of macros

use generic array of slots

cleanup
Copy link
Copy Markdown
Collaborator

@bjackman bjackman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice progress! Added a few proper code review comments as well as general notes

With the addition of static_call_cond_int to the static call API, we
don't need static keys anymore. A default value is provided, which is
put into %eax before the call. This way, if the call is replaced by a
nop, the default value will be in the place of the returned value.

For now, this does not work when one of the arguments of the call is a
function call (eax will be overwritten), hence the modification to
security_mmap_file
Copy link
Copy Markdown
Owner Author

@PaulRenauld PaulRenauld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MVP is pretty much done and open to any comment

@PaulRenauld PaulRenauld force-pushed the poc-updated branch 2 times, most recently from 20f16ad to 3b16261 Compare August 4, 2020 13:28
Copy link
Copy Markdown
Owner Author

@PaulRenauld PaulRenauld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: we got rid of the static call API extension. Instead, we use a switch statement.

The linked list on the LSM hooks are copied into the slots by lsm_init_hook_static_slot. This will fill the slots from first_slot to the last (e.g.: if you have 10 slots and 3 functions, they will be in slots 7,8,9).
To call the hooks, the switch statement jumps to the first used slot, and then falls through to the next one.

Comment on lines +102 to +120
struct security_list_static_slots security_list_static_slots
__lsm_ro_after_init = {
#define DEFINE_SLOT(NUM, NAME) \
(struct security_static_slot) { \
.key = &STATIC_CALL_KEY(STATIC_SLOT(NAME, NUM)), \
.tramp = &STATIC_CALL_TRAMP(STATIC_SLOT(NAME, NUM)) \
},
#define LSM_HOOK(RET, DEFAULT, NAME, ...) \
.NAME = { \
.slots = { \
SECURITY_FOREACH_STATIC_SLOT(DEFINE_SLOT, NAME) \
}, \
.first_slot = INT_MAX, \
.head = &security_hook_heads.NAME \
},
#include <linux/lsm_hook_defs.h>
#undef LSM_HOOK
#undef DEFINE_SLOT
};
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also get rid of this struct (only keep the first_slot for each hook) and then use the X-macro in lsm_init_static_slots. This would save memory that is used only during the init, but make the code heavier.

* INT_MAX if no slot is used.
*/
int first_slot;
/* hlist corresponding to this hook */
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realised we should probably have a comment explaining why we still have the hlist as well as the static slots

int slot_idx, slot_cnt, first_slot;

slot_cnt = 0;
// todo: race condition if hlist modified in between the two foreach
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is impossible since this is all happening during init, no?

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's my understanding as well (I put the comment just to make sure). At least as long as we cannot just add and remove LSMs at abitrary time. The security_delete_hooks function might be annoying here.

} \
} while (0); \
RC; \
#define CALL_STATIC_SLOT_INT(NUM, R, HOOK, ...) \
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a super unhygienic macro - I think that's unavoidable but let's put some __ at the beginning of the name so nobody mistakes it for a "proper" macro!


first_slot = SECURITY_STATIC_SLOT_COUNT - slot_cnt;
if (first_slot < 0)
panic("%s - No static hook slot remaining to add LSM hook.\n",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps instead of panicking when this happens, you could have a fallback path that places a pointer to the normal security_* helper function in the first slot and NOPs out the rest?

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an interesting recovery. Right now the idea is that we should have as many static call for each hook as there are LSMs in the kernel, so this wouldn't happen.

@PaulRenauld PaulRenauld requested a review from bjackman August 13, 2020 17:17
* MACRO(2, ...)
* ...
*/
#define M_LOOP_UNROLLING(N, MACRO, ...) \
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think UNROLL_MACRO_LOOP would be a better name. Or just REPEAT_MACRO?

* Static slots are placeholders for potential LSM hooks.
* Instead of a costly indirect call, they use static calls.
*/
#define SECURITY_STATIC_SLOT_COUNT 3
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also now increase this to be 16 or whatever

@PaulRenauld
Copy link
Copy Markdown
Owner Author

PR #4 separates this RFC into clean commits.

PaulRenauld pushed a commit that referenced this pull request Aug 18, 2020
…kernel/git/kvmarm/kvmarm into kvm-master

KVM/arm fixes for 5.8, take #3

- Disable preemption on context-switching PMU EL0 state happening
  on system register trap
- Don't clobber X0 when tearing down KVM via a soft reset (kexec)
PaulRenauld pushed a commit that referenced this pull request Aug 18, 2020
The vfio_pci_release call will free and clear the error and request
eventfd ctx while these ctx could be in use at the same time in the
function like vfio_pci_request, and it's expected to protect them under
the vdev->igate mutex, which is missing in vfio_pci_release.

This issue is introduced since commit 1518ac2 ("vfio/pci: fix memory
leaks of eventfd ctx"),and since commit 5c5866c ("vfio/pci: Clear
error and request eventfd ctx after releasing"), it's very easily to
trigger the kernel panic like this:

[ 9513.904346] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 9513.913091] Mem abort info:
[ 9513.915871]   ESR = 0x96000006
[ 9513.918912]   EC = 0x25: DABT (current EL), IL = 32 bits
[ 9513.924198]   SET = 0, FnV = 0
[ 9513.927238]   EA = 0, S1PTW = 0
[ 9513.930364] Data abort info:
[ 9513.933231]   ISV = 0, ISS = 0x00000006
[ 9513.937048]   CM = 0, WnR = 0
[ 9513.940003] user pgtable: 4k pages, 48-bit VAs, pgdp=0000007ec7d12000
[ 9513.946414] [0000000000000008] pgd=0000007ec7d13003, p4d=0000007ec7d13003, pud=0000007ec728c003, pmd=0000000000000000
[ 9513.956975] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[ 9513.962521] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_type1 vfio hclge hns3 hnae3 [last unloaded: vfio_pci]
[ 9513.972998] CPU: 4 PID: 1327 Comm: bash Tainted: G        W         5.8.0-rc4+ #3
[ 9513.980443] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B270.01 05/08/2020
[ 9513.989274] pstate: 80400089 (Nzcv daIf +PAN -UAO BTYPE=--)
[ 9513.994827] pc : _raw_spin_lock_irqsave+0x48/0x88
[ 9513.999515] lr : eventfd_signal+0x6c/0x1b0
[ 9514.003591] sp : ffff800038a0b960
[ 9514.006889] x29: ffff800038a0b960 x28: ffff007ef7f4da10
[ 9514.012175] x27: ffff207eefbbfc80 x26: ffffbb7903457000
[ 9514.017462] x25: ffffbb7912191000 x24: ffff007ef7f4d400
[ 9514.022747] x23: ffff20be6e0e4c00 x22: 0000000000000008
[ 9514.028033] x21: 0000000000000000 x20: 0000000000000000
[ 9514.033321] x19: 0000000000000008 x18: 0000000000000000
[ 9514.038606] x17: 0000000000000000 x16: ffffbb7910029328
[ 9514.043893] x15: 0000000000000000 x14: 0000000000000001
[ 9514.049179] x13: 0000000000000000 x12: 0000000000000002
[ 9514.054466] x11: 0000000000000000 x10: 0000000000000a00
[ 9514.059752] x9 : ffff800038a0b840 x8 : ffff007ef7f4de60
[ 9514.065038] x7 : ffff007fffc96690 x6 : fffffe01faffb748
[ 9514.070324] x5 : 0000000000000000 x4 : 0000000000000000
[ 9514.075609] x3 : 0000000000000000 x2 : 0000000000000001
[ 9514.080895] x1 : ffff007ef7f4d400 x0 : 0000000000000000
[ 9514.086181] Call trace:
[ 9514.088618]  _raw_spin_lock_irqsave+0x48/0x88
[ 9514.092954]  eventfd_signal+0x6c/0x1b0
[ 9514.096691]  vfio_pci_request+0x84/0xd0 [vfio_pci]
[ 9514.101464]  vfio_del_group_dev+0x150/0x290 [vfio]
[ 9514.106234]  vfio_pci_remove+0x30/0x128 [vfio_pci]
[ 9514.111007]  pci_device_remove+0x48/0x108
[ 9514.115001]  device_release_driver_internal+0x100/0x1b8
[ 9514.120200]  device_release_driver+0x28/0x38
[ 9514.124452]  pci_stop_bus_device+0x68/0xa8
[ 9514.128528]  pci_stop_and_remove_bus_device+0x20/0x38
[ 9514.133557]  pci_iov_remove_virtfn+0xb4/0x128
[ 9514.137893]  sriov_disable+0x3c/0x108
[ 9514.141538]  pci_disable_sriov+0x28/0x38
[ 9514.145445]  hns3_pci_sriov_configure+0x48/0xb8 [hns3]
[ 9514.150558]  sriov_numvfs_store+0x110/0x198
[ 9514.154724]  dev_attr_store+0x44/0x60
[ 9514.158373]  sysfs_kf_write+0x5c/0x78
[ 9514.162018]  kernfs_fop_write+0x104/0x210
[ 9514.166010]  __vfs_write+0x48/0x90
[ 9514.169395]  vfs_write+0xbc/0x1c0
[ 9514.172694]  ksys_write+0x74/0x100
[ 9514.176079]  __arm64_sys_write+0x24/0x30
[ 9514.179987]  el0_svc_common.constprop.4+0x110/0x200
[ 9514.184842]  do_el0_svc+0x34/0x98
[ 9514.188144]  el0_svc+0x14/0x40
[ 9514.191185]  el0_sync_handler+0xb0/0x2d0
[ 9514.195088]  el0_sync+0x140/0x180
[ 9514.198389] Code: b9001020 d2800000 52800022 f9800271 (885ffe61)
[ 9514.204455] ---[ end trace 648de00c8406465f ]---
[ 9514.212308] note: bash[1327] exited with preempt_count 1

Cc: Qian Cai <cai@lca.pw>
Cc: Alex Williamson <alex.williamson@redhat.com>
Fixes: 1518ac2 ("vfio/pci: fix memory leaks of eventfd ctx")
Signed-off-by: Zeng Tao <prime.zeng@hisilicon.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
PaulRenauld pushed a commit that referenced this pull request Aug 18, 2020
The `INSN_CONFIG` comedi instruction with sub-instruction code
`INSN_CONFIG_DIGITAL_TRIG` includes a base channel in `data[3]`. This is
used as a right shift amount for other bitmask values without being
checked.  Shift amounts greater than or equal to 32 will result in
undefined behavior.  Add code to deal with this.

Fixes: 33cdce6 ("staging: comedi: addi_apci_1032: conform to new INSN_CONFIG_DIGITAL_TRIG")
Cc: <stable@vger.kernel.org> #3.8+
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Link: https://lore.kernel.org/r/20200717145257.112660-3-abbotti@mev.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
PaulRenauld pushed a commit that referenced this pull request Aug 18, 2020
The `INSN_CONFIG` comedi instruction with sub-instruction code
`INSN_CONFIG_DIGITAL_TRIG` includes a base channel in `data[3]`. This is
used as a right shift amount for other bitmask values without being
checked.  Shift amounts greater than or equal to 32 will result in
undefined behavior.  Add code to deal with this.

Fixes: 1e15687 ("staging: comedi: addi_apci_1564: add Change-of-State interrupt subdevice and required functions")
Cc: <stable@vger.kernel.org> #3.17+
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Link: https://lore.kernel.org/r/20200717145257.112660-4-abbotti@mev.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
PaulRenauld pushed a commit that referenced this pull request Aug 18, 2020
Stalls are quite frequent with recent kernels. I enabled
CONFIG_SOFTLOCKUP_DETECTOR and I caught the following stall:

watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [cc1:22803]
CPU: 0 PID: 22803 Comm: cc1 Not tainted 5.6.17+ #3
Hardware name: 9000/800/rp3440
 IAOQ[0]: d_alloc_parallel+0x384/0x688
 IAOQ[1]: d_alloc_parallel+0x388/0x688
 RP(r2): d_alloc_parallel+0x134/0x688
Backtrace:
 [<000000004036974c>] __lookup_slow+0xa4/0x200
 [<0000000040369fc8>] walk_component+0x288/0x458
 [<000000004036a9a0>] path_lookupat+0x88/0x198
 [<000000004036e748>] filename_lookup+0xa0/0x168
 [<000000004036e95c>] user_path_at_empty+0x64/0x80
 [<000000004035d93c>] vfs_statx+0x104/0x158
 [<000000004035dfcc>] __do_sys_lstat64+0x44/0x80
 [<000000004035e5a0>] sys_lstat64+0x20/0x38
 [<0000000040180054>] syscall_exit+0x0/0x14

The code was stuck in this loop in d_alloc_parallel:

    4037d414:   0e 00 10 dc     ldd 0(r16),ret0
    4037d418:   c7 fc 5f ed     bb,< ret0,1f,4037d414 <d_alloc_parallel+0x384>
    4037d41c:   08 00 02 40     nop

This is the inner loop of bit_spin_lock which is called by hlist_bl_unlock in
d_alloc_parallel:

static inline void bit_spin_lock(int bitnum, unsigned long *addr)
{
        /*
         * Assuming the lock is uncontended, this never enters
         * the body of the outer loop. If it is contended, then
         * within the inner loop a non-atomic test is used to
         * busywait with less bus contention for a good time to
         * attempt to acquire the lock bit.
         */
        preempt_disable();
#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
        while (unlikely(test_and_set_bit_lock(bitnum, addr))) {
                preempt_enable();
                do {
                        cpu_relax();
                } while (test_bit(bitnum, addr));
                preempt_disable();
        }
#endif
        __acquire(bitlock);
}

After consideration, I realized that we must be losing bit unlocks.
Then, I noticed that we missed defining atomic64_set_release().
Adding this define fixes the stalls in bit operations.

Signed-off-by: Dave Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org
Signed-off-by: Helge Deller <deller@gmx.de>
PaulRenauld pushed a commit that referenced this pull request Aug 18, 2020
Ido Schimmel says:

====================
mlxsw fixes

This patch set contains various fixes for mlxsw.

Patches #1-#2 fix two trap related issues introduced in previous cycle.

Patches #3-#5 fix rare use-after-frees discovered by syzkaller. After
over a week of fuzzing with the fixes, the bugs did not reproduce.

Patch #6 from Amit fixes an issue in the ethtool selftest that was
recently discovered after running the test on a new platform that
supports only 1Gbps and 10Gbps speeds.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
PaulRenauld pushed a commit that referenced this pull request Aug 18, 2020
…kernel/git/kvmarm/kvmarm into kvm-master

KVM/arm64 fixes for Linux 5.8, take #3

- Fix a corner case of a new mapping inheriting exec permission without
  and yet bypassing invalidation of the I-cache
- Make sure PtrAuth predicates oinly generate inline code for the
  non-VHE hypervisor code
PaulRenauld pushed a commit that referenced this pull request Aug 18, 2020
I compiled with AddressSanitizer and I had these memory leaks while I
was using the tep_parse_format function:

    Direct leak of 28 byte(s) in 4 object(s) allocated from:
        #0 0x7fb07db49ffe in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dffe)
        #1 0x7fb07a724228 in extend_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:985
        #2 0x7fb07a724c21 in __read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1140
        #3 0x7fb07a724f78 in read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1206
        #4 0x7fb07a725191 in __read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1291
        #5 0x7fb07a7251df in read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1299
        #6 0x7fb07a72e6c8 in process_dynamic_array_len /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:2849
        #7 0x7fb07a7304b8 in process_function /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3161
        #8 0x7fb07a730900 in process_arg_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3207
        #9 0x7fb07a727c0b in process_arg /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1786
        #10 0x7fb07a731080 in event_read_print_args /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3285
        #11 0x7fb07a731722 in event_read_print /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3369
        #12 0x7fb07a740054 in __tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6335
        #13 0x7fb07a74047a in __parse_event /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6389
        #14 0x7fb07a740536 in tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6431
        #15 0x7fb07a785acf in parse_event ../../../src/fs-src/fs.c:251
        #16 0x7fb07a785ccd in parse_systems ../../../src/fs-src/fs.c:284
        #17 0x7fb07a786fb3 in read_metadata ../../../src/fs-src/fs.c:593
        #18 0x7fb07a78760e in ftrace_fs_source_init ../../../src/fs-src/fs.c:727
        #19 0x7fb07d90c19c in add_component_with_init_method_data ../../../../src/lib/graph/graph.c:1048
        #20 0x7fb07d90c87b in add_source_component_with_initialize_method_data ../../../../src/lib/graph/graph.c:1127
        #21 0x7fb07d90c92a in bt_graph_add_source_component ../../../../src/lib/graph/graph.c:1152
        #22 0x55db11aa632e in cmd_run_ctx_create_components_from_config_components ../../../src/cli/babeltrace2.c:2252
        #23 0x55db11aa6fda in cmd_run_ctx_create_components ../../../src/cli/babeltrace2.c:2347
        #24 0x55db11aa780c in cmd_run ../../../src/cli/babeltrace2.c:2461
        #25 0x55db11aa8a7d in main ../../../src/cli/babeltrace2.c:2673
        #26 0x7fb07d5460b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)

The token variable in the process_dynamic_array_len function is
allocated in the read_expect_type function, but is not freed before
calling the read_token function.

Free the token variable before calling read_token in order to plug the
leak.

Signed-off-by: Philippe Duplessis-Guindon <pduplessis@efficios.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/linux-trace-devel/20200730150236.5392-1-pduplessis@efficios.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
@PaulRenauld PaulRenauld mentioned this pull request Aug 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants